Automatic Phoneme Segmentation with Relaxed Textual Constraints
نویسندگان
چکیده
Speech synthesis by unit selection requires the segmentation of a large single speaker high quality recording. Automatic speech recognition techniques, e.g. Hidden Markov Models (HMM), can be optimised for maximum segmentation accuracy. This paper presents the results of tuning such a phoneme segmentation system. Firstly, using no text transcription, the design of an HMM phoneme recogniser is optimised subject to a phoneme bigram language model. Optimal performance is obtained with triphone models, 7 states per phoneme and 5 Gaussians per state, reaching 94.4% phoneme recognition accuracy with 95.2% of phoneme boundaries within 70 ms of hand labelled boundaries. Secondly, using the textual information modeled by a multi-pronunciation phonetic graph built according to errors found in the first step, the reported phoneme recognition accuracy increases to 96.8% with 96.1% of phoneme boundaries within 70 ms of hand labelled boundaries. Finally, the results from these two segmentation methods based on different phonetic graphs, the evaluation set, the hand labelling and the test procedures are discussed and possible improvements are proposed.
منابع مشابه
Unsupervised Phoneme Segmentation Using Transformed Cepstrum Features
One of the basic problems in speech engineering is phoneme segmentation, that is, to divide a speech stream into a string of phonemes. Automatic Speech Recognition (ASR) models often require reliable phoneme segmentation in the initial training phase, and Text-to-Speech (TTS) systems need a large speech database with correct phoneme segmentation information for improving the performance. Human ...
متن کاملStatistical corpus-based speech segmentation
An automatic speech segmentation technique is presented that is based on the alignment of a target speech signal with a set of different reference speech signals generated by a specific designed corpus-based speech synthesis system that additionally generates phoneme boundary markers. Each reference signal is then warped to the target speech signal. By synthesizing and warping many different re...
متن کاملFrom Syntactical Analysis to Textual Segmentation
In this work a proposal for automatic textual segmentation is described. The proposal uses the output of an automatic syntactic analyzer – Parser Palavras – to create textual segmentation. Parse trees are used to infer text segments and a dependency tree of the identified segments. The main contribution of this work is the use of the syntactic structure as source for the automatic segmentation ...
متن کاملLyrics-to-audio Alignment and Phrase-level Segmentation Using Incomplete Internet-style Chord Annotations
We propose two novel lyrics-to-audio alignment methods which make use of additional chord information. In the first method we extend an existing hidden Markov model (HMM) for lyrics alignment [1] by adding a chord model based on the chroma features often used in automatic audio chord detection. However, the textual transcriptions found on the Internet usually provide chords only for the first a...
متن کاملA constrained baum-welch algorithm for improved phoneme segmentation and efficient training
We describe an extension to the Baum-Welch algorithm for training Hidden Markov Models that uses explicit phoneme segmentation to constrain the forward and backward lattice. The HMMs trained with this algorithm can be shown to improve the accuracy of automatic phoneme segmentation. In addition, this algorithm is significantly more computationally efficient than the full BaumWelch algorithm, whi...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2008